ABSTRACT

In this paper we compare a variety of modern image compression methods on a large sample of 
astronomical images. We begin by demonstrating from first principles how the amount of noise in 
the image pixel values sets a theoretical upper limit on the lossless compression ratio of the image. 
We derive simple procedures for measuring the amount of noise in an image and for quantitatively 
predicting how much compression will be possible. We then compare the traditional technique 
of using the GZIP utility to externally compress the image, with a newer technique of dividing 
the image into tiles, and then compressing and storing each tile in a FITS binary table structure. 
This tiled-image compression technique offers a choice of other compression algorithms besides 
GZIP, some of which are much better suited to compressing astronomical images. Our tests on 
a large sample of images show that the Rice algorithm provides the best combination of speed 
and compression efficiency. In particular, Rice typically produces 1.5 times greater compression 
and provides much faster compression speed than GZIP. 

Floating point images generally contain too much noise to be effectively compressed with 
any lossless algorithm. We have developed a compression technique which discards some of the 
useless noise bits by quantizing the pixel values as scaled integers. The integer images can then 
be compressed by a factor of 4 or more. 

Our image compression and uncompression utilities (called fpack and funpack) that were 
used in this study are publicly available from the HEASARC web site. Users may run these 
stand-alone programs to compress and uncompress their own image files. 

Subject headings: image compression, FITS format, fpack 


1. Introduction 

As the size of astronomical data archives con- 
tinues to increase at ever greater rates, it is in the 
interests of both data providers and data users to 
make use of the most effective possible file
compression techniques. Compressing the data
reduces the storage media costs and also reduces the
network bandwidth needed to transmit the files to 
users. In principle, compressing the data may also 
reduce data analysis times because the software 
needs to transfer fewer bytes to or from local disks. 

In this paper we investigate the state of the art 
in lossless compression of astronomical images by 
comparing the performance of different compres- 
sion techniques. In particular, we show how noise 
in astronomical images is the main limiting factor 
in the amount of lossless compression that can be 
obtained. We begin in the following section by de- 
scribing a new tiled-image compression format and 
the compression algorithms that are used in this 
study. Then in section 3 we quantitatively demon- 
strate how the amount of noise in an image can 
be used to derive the expected compression ratio 
and show that synthetic images, containing known 
amounts of noise, closely follow this expected re- 
lationship. In sections 4 and 5 we compare how 
actual 16-bit and 32-bit integer astronomical im- 
ages compress with the different algorithms. Then 
in section 6 we discuss why lossless compression of 
floating point images is usually not cost effective, 
and present an alternative method which produces 
much better compression by discarding some of 
the noise in the pixel values. Finally, section 7 
discusses the effect that the tiling pattern has on 
the compression performance, and section 8 sum- 
marizes the main results of this study. 

2. Compression Methods 

In this study we use a new compressed im- 
age format that is based on the FITS tiled-image 
compression convention (Pence et al. 2000; Sea- 
man et al. 2007). Under this convention, the 
image is first divided into a rectangular grid of 
“tiles”. Usually the image is tiled on a row by 
row basis, but any other rectangular tile size may 
be specified. Each tile of pixels is then com- 
pressed using one of several available compression 
algorithms (described below), and the compressed 
stream of bytes is stored in a variable length ar- 
ray column in a FITS binary table. Each row of 
the FITS binary table corresponds to one tile in 
the image. Our software uses the CFITSIO li- 
brary (Pence 1999) to transparently read and 
write these compressed files as if they were ordi- 
nary FITS images, even though they are physically 
stored in a table format. One of the advantages 
of using this tiled image convention, compared to 
the other common technique of externally com- 
pressing the entire FITS image, is that the com- 
pressed FITS image is itself a valid FITS file and 
the image header keywords remain uncompressed. 
This enables much faster read and write access to 
the metadata keywords that describe the image. 
Another advantage is that since each image in a 
multi-extension FITS file is compressed individu- 
ally, it is not necessary to uncompress the entire 
file just to read a single image. Also, if only a 
small section of the image is being read, only the 
corresponding tiles need to be uncompressed, not 
the entire image. 
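
The benefit of independent tiles can be illustrated with a minimal Python sketch. It is not the FITS binary table layout and does not use CFITSIO; it assumes one tile per image row, uses the standard `zlib` module as a stand-in compressor, and the helper names `compress_tiles` and `read_row` are hypothetical.

```python
import zlib

def compress_tiles(rows):
    """Compress each image row (one tile per row) independently."""
    return [zlib.compress(row, 1) for row in rows]

def read_row(tiles, index):
    """Uncompress only the tile that holds the requested row."""
    return zlib.decompress(tiles[index])

# A small hypothetical 16-bit image: 4 rows of 8 pixels, big-endian bytes.
image = [[100 + r + c for c in range(8)] for r in range(4)]
rows = [b"".join(p.to_bytes(2, "big") for p in row) for row in image]

tiles = compress_tiles(rows)

# Reading row 2 touches tiles[2] only; the other tiles stay compressed.
row2 = read_row(tiles, 2)
pixels = [int.from_bytes(row2[i:i + 2], "big") for i in range(0, len(row2), 2)]
```

Because every tile is a self-contained compressed stream, a reader that needs a small image section only pays the uncompression cost of the tiles that overlap it.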

At present, the implementation of this convention
in the CFITSIO library supports 4 lossless
compression algorithms: Rice, Hcompress, PLIO, and
GZIP. The main features of each of these
algorithms are described below. 

Rice: The Rice algorithm (Rice, Yeh & Miller 
1993; White & Becker 1998) is very simple (ad- 
ditions, subtractions, and some bit masking and 
shifts), making it computationally efficient. In 
fact, it has been implemented in hardware for
use on spacecraft and in embedded systems, and 
has been considered for use in compressing images 
from future space telescopes (Nieto-Santisteban et 
al. 1999). In its usual implementation, it encodes 
the differences of consecutive pixels using a vari- 
able number of bits. Pixel differences near zero are 
coded with few bits and large differences require 
more bits. The lengths of the codes are optimal 
when the difference of adjacent pixels has an expo- 
nential probability distribution (which turns out 
to be common in most images). There is a single 
parameter for the codes that adapts to the noise 
by determining the number of pure noise bits to 
strip off the bottom of the difference and include 
directly in the output bitstream (with no coding). 
The best value for this noise scale is computed 
independently for each block of 16 or 32 pixels. 
With such short blocks, the algorithm requires lit- 
tle memory and adapts quickly to any variations 
in pixel statistics across the image. 
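
The coding scheme described above can be sketched in a few lines of Python. This is a simplified illustration, not the CFITSIO implementation: it builds a string of '0'/'1' characters rather than packing bits, chooses the noise parameter k by brute force, and omits the special cases that production coders use for very low or very high entropy blocks. All function names are hypothetical.

```python
def zigzag(d):
    """Map a signed difference to a non-negative integer (0,-1,1,-2 -> 0,1,2,3)."""
    return 2 * d if d >= 0 else -2 * d - 1

def unzigzag(u):
    """Inverse of zigzag."""
    return u // 2 if u % 2 == 0 else -(u + 1) // 2

def best_k(values, max_k=15):
    """Number of low-order 'noise' bits that minimizes the coded block length."""
    return min(range(max_k + 1),
               key=lambda k: sum((u >> k) + 1 + k for u in values))

def encode_block(values, k):
    """Unary code for the high part, then k raw low-order bits per value."""
    parts = []
    for u in values:
        parts.append("1" * (u >> k) + "0")
        if k:
            parts.append(format(u & ((1 << k) - 1), "0{}b".format(k)))
    return "".join(parts)

def rice_encode(pixels, block=32):
    """Return the first pixel and a list of (k, count, bitstring) blocks."""
    diffs = [zigzag(b - a) for a, b in zip(pixels, pixels[1:])]
    blocks = []
    for i in range(0, len(diffs), block):
        chunk = diffs[i:i + block]
        k = best_k(chunk)
        blocks.append((k, len(chunk), encode_block(chunk, k)))
    return pixels[0], blocks

def rice_decode(first, blocks):
    """Rebuild the pixel row from the first pixel and the coded differences."""
    pixels = [first]
    for k, count, bits in blocks:
        pos = 0
        for _ in range(count):
            q = 0
            while bits[pos] == "1":      # unary-coded high part
                q += 1
                pos += 1
            pos += 1                     # skip the terminating '0'
            low = int(bits[pos:pos + k], 2) if k else 0
            pos += k
            pixels.append(pixels[-1] + unzigzag((q << k) | low))
    return pixels

pixels = [1000, 1003, 998, 1001, 1000, 1007, 995, 1002]
first, blocks = rice_encode(pixels)
decoded = rice_decode(first, blocks)
total_bits = sum(len(bits) for _, _, bits in blocks)
```

For this short row the seven differences are coded with k = 3 in 33 bits, compared with 112 bits for seven raw 16-bit values; small differences cost few bits and the k low-order bits are copied through uncoded.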

Hcompress: The Hcompress image compression
algorithm was written to compress the Space 
Telescope Science Institute digitized sky survey 
images (White et al. 1992). This method involves 
(1) a wavelet transform called the H-transform (a 
Haar transform generalized to two dimensions), 
followed by (2) an optional quantization that dis- 
cards noise in the image while retaining the signal 
on all scales, followed by (3) a quadtree coding of 
the quantized coefficient bitplanes. In this study 
we omitted the quantization step, which makes 
Hcompress lossless. The H-transform computes 
sums and differences within pixel blocks, starting 
with small 2x2 blocks and then increasing by fac- 
tors of two to 4x4, 8x8, etc., blocks. This is an 
exactly reversible, integer arithmetic operation, so 
a losslessly encoded set of the H-transform coeffi- 
cients can be uncompressed and inversely trans- 
formed to recover the original image. The H- 
transform can be performed in-place in memory 
and requires enough memory to hold the original 
image (or image tile). To avoid overflow problems 
when summing the pixel values, the memory ar- 
ray is expanded by a factor of 2 so that each pixel 
has twice as many bits as in the original image. 
The Hcompress bitplane coding, which proceeds 
by first compressing the most significant bit of 
each coefficient (mostly zeros) and working down 
to the least significant bit (usually random noise), 
has the effect of ordering the image description so 
that the data stream gives a progressively better 
approximation to the original image as more bits 
are received. This was used to create an efficient 
adaptive scheme for image transmission (Percival 
& White 1996). 
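
To make the reversibility argument concrete, the following Python sketch applies one level of sums and differences to a single 2x2 block and then inverts it. It shows the unnormalized form of the transform as an assumption; the Hcompress code itself may scale or order the coefficients differently, and the function names are hypothetical.

```python
def h_forward(a00, a01, a10, a11):
    """Sums and differences of one 2x2 block, using integer arithmetic only."""
    h0 = a11 + a10 + a01 + a00   # block sum (input to the next, 4x4 level)
    hx = a11 + a10 - a01 - a00   # difference along one axis
    hy = a11 - a10 + a01 - a00   # difference along the other axis
    hc = a11 - a10 - a01 + a00   # diagonal difference
    return h0, hx, hy, hc

def h_inverse(h0, hx, hy, hc):
    """Exact inverse: each combination below equals 4 times one pixel."""
    a11 = (h0 + hx + hy + hc) // 4
    a10 = (h0 + hx - hy - hc) // 4
    a01 = (h0 - hx + hy - hc) // 4
    a00 = (h0 - hx - hy + hc) // 4
    return a00, a01, a10, a11

block = (10, 12, 9, 200)
coeffs = h_forward(*block)
restored = h_inverse(*coeffs)
```

The block sum of four 16-bit values needs up to 18 bits, which is consistent with the need, noted above, to widen the working array before summing.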

PLIO: The IRAF (Tody 1993) Pixel List I/O 
(PLIO) algorithm was developed to store integer 
image masks in a compressed form. This special- 
purpose run-length encoding algorithm is very ef- 
fective on typical masks consisting of isolated high 
or low values embedded in extended regions that 
have a constant pixel value. Our implementation 
of this algorithm only supports pixel values in the 
range 0 to $2^{23}$. Because of the specialized nature of 
the PLIO algorithm, we only discuss its use with 
compressing data masks, in section 4.3. 

GZIP: The popular GZIP file compression utility
(Gailly & Adler 1992) works by building a
dictionary of repeated sequences of bytes occurring in 
the input and using a short code for each sequence. 
The most important distinguishing characteristic 
of GZIP compared to the other compression al- 
gorithms used in this study is that GZIP treats 
each 8-bit byte of the input data stream as an in- 
dependent datum, whereas the other compression 
methods operate on the numerical value of the in- 
put image pixels as multi-byte quantities. This 
puts GZIP at a distinct disadvantage when com- 
pressing astronomical images with 16-bit or 32-bit 
pixel values. Since GZIP does not use the numer- 
ical value of the pixels, it cannot use knowledge 
of the approximate value of the next pixel to
improve the compression. As a result, it becomes 
less effective when increasing noise makes repeated 
patterns less common. 

It should be noted that the GZIP algorithm has 
a user-selectable parameter, with a value ranging 
from 1 to 9, for fine-tuning the trade-off between 
speed and compression ratio. A value of 1 gives 
the fastest compression at the expense of file size 
and 9 gives the highest compression at the expense 
of speed. Using the fastest value of 1 instead of 
the default value of 6 for this parameter generally 
increases the speed by a large factor while only 
increasing the compressed file size by a few percent;
therefore we have used a value of 1 in all the 
speed comparison tests in this study. One small 
side effect of using this fastest compression speed, 
however, is that it increases the subsequent image 
uncompression time by about 10%. 

Within this study, GZIP is used in 2 different 
processing contexts which have significantly differ- 
ent compression speeds. In the first context, the 
GZIP program on the host computer is used to 
externally compress the FITS image, and in the 
other context the GZIP algorithm is used within 
the FITS tiled image convention to compress each 
image tile. The numerical algorithm is identical
in both cases; however, the host GZIP program
only takes about half as much CPU time 
as the tiled GZIP method to compress the same 
image. This difference is mainly due to the fact 
that the host GZIP program can more efficiently 
read and write the input and output files as se- 
quential streams of bytes, whereas the tiled image 
compression method requires random access to the 
FITS files, which in turn requires that the input 
and output data be copied to intermediate storage 
buffers in memory. As will be demonstrated later, 
in spite of this extra processing overhead the tiled 
Rice algorithm can still compress images several 
times faster than the host GZIP program. 

3. The Effect of Noise on Lossless Image 
Compression 

The fundamental principle that ultimately lim- 
its the amount of lossless image compression is 
the fact that random noise is inherently incom- 
pressible. In this section we demonstrate how this 
principle can be extended to quantitatively under- 
stand how the amount of noise in an image sets a 
theoretical upper limit on the lossless image
compression ratio. 

In order to study the effects of noise on image 
compression, it is important to be able to accu- 
rately measure the amount of noise in any image. 
After some experimentation, we found that an al- 
gorithm that was originally developed to measure 
the signal-to-noise in spectroscopic data (Stoehr 
et al. 2007) serves our needs well. In particular, 
we adopted the 3rd order “median absolute difference”
formula to compute the standard deviation 
of the pixel values in each row of the image: 

$$\sigma = 0.6052 \times \mathrm{median}\left(\left| -x_{i-2} + 2x_i - x_{i+2} \right|\right) \qquad (1)$$

where $x_{i-2}$ is the value of the pixel 2 spaces to the 
left of pixel $i$, and $x_{i+2}$ is the value of the pixel 2 
spaces to the right, and the median value is computed
over all the pixels in each row of the image.
The use of the median in this formula makes 
the result insensitive to the presence of outlying 
large pixel values. The noise value for the im- 
age as a whole is then computed from the mean 
of these median values for every row of the im- 
age. In the limit where the image contains pure 
Gaussian-distributed noise, this formula converges 
to give the same value as the standard deviation 
of all the pixel values. 
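
As a rough check of Equation 1, the Python sketch below applies it to a synthetic image of pure Gaussian noise. The function names, the image size, and the random seed are arbitrary assumptions for illustration; only the formula itself and the per-row median followed by a mean over rows come from the text.

```python
import random
from statistics import mean, median

def row_noise(row):
    """Equation 1 for one image row (the row needs at least 5 pixels)."""
    diffs = [abs(2 * row[i] - row[i - 2] - row[i + 2])
             for i in range(2, len(row) - 2)]
    return 0.6052 * median(diffs)

def image_noise(image):
    """Noise of the whole image: the mean of the per-row estimates."""
    return mean(row_noise(row) for row in image)

rng = random.Random(12345)
sigma_true = 10.0
# 100 rows of 500 pixels: a constant background plus Gaussian noise.
image = [[1000.0 + rng.gauss(0.0, sigma_true) for _ in range(500)]
         for _ in range(100)]

sigma_est = image_noise(image)
```

For Gaussian noise the combination $2x_i - x_{i-2} - x_{i+2}$ has a standard deviation of $\sqrt{6}\,\sigma$, and the median of its absolute value is about 0.6745 times that, so the factor 0.6052 rescales the median back to $\sigma$.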

It is easier to understand how the noise affects 
image compression by considering a hypothetical 
image with pixels that have BITPIX bits (where 
BITPIX is usually 8, 16, or 32 in astronomical
images) and where the lowest N bits of each pixel 
are 100% dominated by noise and the higher order
BITPIX - N bits are completely noise-free. The 
N noise bits (denoted $N_{\mathrm{bits}}$ in the equations) are by
definition totally incompressible, so the theoretical
maximum compression ratio, even if all the
remaining bits are compressed to 0, is given simply by 

$$R = \frac{\text{original size}}{\text{compressed size}} = \frac{\mathrm{BITPIX}}{N_{\mathrm{bits}}} \qquad (2)$$

In practice, no actual algorithm can infinitely com- 
press the non-noise bits, and instead can only com- 
press them, on average, down to K bits per pixel. 
This K parameter can be viewed as a measure of 
the efficiency of a compression algorithm, where 
better algorithms have smaller values of K. The 
actual compression ratio can then be expressed as 

$$R = \frac{\mathrm{BITPIX}}{N_{\mathrm{bits}} + K} \qquad (3)$$


To illustrate, if a 16-bit image containing 4 bits 
of noise per pixel is compressed with an algorithm 
that has K = 2, the compression ratio that one 
can expect is R = 16/6 = 2.7. As will be shown 
later, the best compression algorithms have K ≈ 1,
and thus are able to compress all the non-noise 
bits in each image pixel down to about 1 bit on 
average. 

In real images, the noise is not neatly confined 
to the lowest order bits, and instead, there is a 
gradual transition from the least significant bit 
that is most dominated by noise, to the more sig- 
nificant bits that successively contain less noise. 
We can calculate the “equivalent” number of pure 
noise bits per pixel in this case from the background
pixel noise, $\sigma$, given by Equation 1. If the 
image pixel values have a Gaussian noise distribution,
then the equivalent number of noise bits in 
each pixel (as derived in the attached appendix) 
is given by 

$$N_{\mathrm{bits}} = \log_2\left(\sigma\sqrt{12}\right) = \log_2(\sigma) + 1.792 \qquad (4)$$

Substituting this into Equation 3 then gives 

$$R = \frac{\mathrm{BITPIX}}{\log_2(\sigma) + 1.792 + K} \qquad (5)$$

For example, a 16-bit image with $\sigma = 30$ has 
$\log_2(30) + 1.792 = 6.7$ equivalent noise bits, and 
the expected compression ratio will be about 2.1 
for an algorithm that has K = 1. 
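
The two worked examples in this section can be reproduced with a short Python sketch of Equations 3 and 4. The function names are hypothetical; the numerical inputs are the ones quoted in the text.

```python
import math

def equivalent_noise_bits(sigma):
    """Equation 4: N_bits = log2(sigma * sqrt(12))."""
    return math.log2(sigma * math.sqrt(12.0))

def expected_ratio(bitpix, nbits, k):
    """Equation 3: R = BITPIX / (N_bits + K)."""
    return bitpix / (nbits + k)

# 16-bit image, 4 pure noise bits, algorithm with K = 2.
r_simple = expected_ratio(16, 4, 2)

# 16-bit image with Gaussian noise sigma = 30, algorithm with K = 1.
nbits_30 = equivalent_noise_bits(30.0)
r_gauss = expected_ratio(16, nbits_30, 1)
```

Here `r_simple` evaluates to 16/6, about 2.7, and `r_gauss` to about 2.1, in agreement with the values given above.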

To verify that real compression algorithms fol- 
low this expected relationship between compres- 
sion ratio and noise, we generated 2 sets of syn- 
thetic FITS images containing differing amounts 
of random noise. In the first set, each image con- 
tained N bits of pure noise such that the least sig- 
nificant N bits of each pixel value (where N ranged 
from 0 to BITPIX - 1) were randomly assigned a 
value of 0 or 1 and all the more significant bits were 
set to 0. In the second set of synthetic images, the 
pixel values had a Gaussian random distribution, 
with $\sigma$ ranging from 1.0 to 500. The effective number
of noise bits in this second set of images was 
then calculated from Equation 4. 
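
The first kind of synthetic image is easy to imitate. The Python sketch below builds 16-bit pixel streams whose lowest N bits are random and compresses them with the standard `zlib` module, which implements the same DEFLATE algorithm that GZIP uses. It is only an illustration under stated assumptions: it does not go through the FITS tiled-image code path, the image size and seed are arbitrary, and the function names are hypothetical.

```python
import random
import zlib

def noise_image_bytes(nbits, npix, rng):
    """16-bit big-endian pixels: lowest `nbits` bits random, all others zero."""
    return b"".join(
        rng.getrandbits(nbits).to_bytes(2, "big") if nbits else b"\x00\x00"
        for _ in range(npix)
    )

def bits_per_pixel(data, npix, level=1):
    """Average number of compressed bits per pixel (BITPIX / R)."""
    return 8.0 * len(zlib.compress(data, level)) / npix

rng = random.Random(2009)
npix = 50_000
results = {n: bits_per_pixel(noise_image_bytes(n, npix, rng), npix)
           for n in (0, 4, 8, 12)}

# K for each case is the excess over the incompressible noise bits.
k_values = {n: bpp - n for n, bpp in results.items()}
```

Since the N random bits carry N bits of entropy per pixel, the compressed size cannot fall below N bits per pixel, and the excess stored in `k_values` plays the role of K in Equation 3.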

We then measured the compression ratio, R, 
of each of these synthetic images when using the 
3 general-purpose tiled-image compression
algorithms, Rice, GZIP, and Hcompress. Figure 1 
shows the resulting plot of BITPIX/R (i.e., the 
average number of compressed bits per pixel) as 
a function of the number of actual or equivalent 
noise bits in the 16-bit images. This coordinate 
frame was chosen because lines with constant K 
have a slope of 1.0 and a Y-intercept equal to K, as 
shown by the dashed lines in the figure. The solid 
lines in the figure represent the images with N bits 
of pure noise, and the circles or triangles represent 
the other set of synthetic images with
Gaussian-distributed noise, where the equivalent number of 
noise bits is calculated from Equation 4. The
corresponding compression ratio, R, is shown by the 
horizontal dashed lines. 

Fig. 1. — Plot of compressed bits per pixel versus
the number of noise bits in 16-bit synthetic 
images. The solid lines represent the images that 
have $N_{\mathrm{bits}}$ of pure noise, and the symbols represent 
the images that have Gaussian distributed noise, 
where $N_{\mathrm{bits}}$ is calculated from Equation 4. The 
solid circles are for Rice compression, the open
circles are for Hcompress, and the triangles are for GZIP. 

Fig. 2. — Same as Figure 1, except for 32-bit
integer synthetic noise images. 

It can be seen in Figure 1 that the Rice and 
Hcompress algorithms do have the expected slope 
of 1, and they have constant K values, indepen- 
dent of the amount of noise in the image, of about 
1.2 and 0.9, respectively. The close agreement be- 
tween the lines (derived from the pure noise syn- 
thetic images) and corresponding points (derived 
from the Gaussian noise images) for these 2 com- 
pression algorithms confirms the validity of Equa- 
tion 4 for computing the equivalent number of 
noise bits in images. Close inspection shows that 
there is a slight flattening of the slope of these
relations for 1 < $N_{\mathrm{bits}}$ < 5, which can be attributed 
mainly to the fact that there is a small amount of 
fixed disk space “overhead” required to store the 
compressed images in a FITS binary table
structure, and this overhead becomes relatively more 
significant as the size of the compressed image
decreases. (See also the discussion in the appendix 
of the non-linear behavior at small values of N.) 

It is also quite apparent that GZIP behaves very 
differently than Rice or Hcompress in Figure 1. 
GZIP cannot be parameterized with a single value 
of K, and instead K ranges from about 2 to 5, 
depending on the amount and distribution of the 
noise in the image. Unlike the other 2 algorithms, 
GZIP does not compress the 2 types of synthetic 
noise images equally well; it is more effective at
compressing the images in which the noise is confined 
to the lowest N bits. It is interesting that K
appears to reach a maximum at $N_{\mathrm{bits}}$ = 8, which is 
where the noise propagates into the more significant
byte of the 2-byte pixel values. This difference
can probably be attributed to the fact that 
GZIP interprets each pixel as 2 independent 8-bit 
bytes whereas Rice and Hcompress treat each 16- 
bit pixel value as a single integer number. 

The equivalent plot for the synthetic 32-bit integer
images is shown in Figure 2. The relations 
for Rice and Hcompress are virtually identical to 
those for 16-bit integers shown in Figure 1, and 
in particular, the K values are the same. This is 
expected because the size of the compressed image 
when using Rice or Hcompress only depends on the 
amount of noise, not on the byte size of the pixels. 
Thus if a 16-bit and a 32-bit image have the same 
dimensions and the same amount of noise, 
the compressed files will be identical in size and 
hence the compression ratio of the 32-bit image 
will be exactly twice that of the 16-bit image. 
As was the case with 16-bit images, GZIP behaves 
quite differently because of the way it interprets 
the image as a stream of bytes. The compression 
efficiency is much worse than Rice or Hcompress, 
with a variable K value that approaches a value 
of 8 for the noisiest images. 

Implicit in the above discussion is the fact that 
the compression ratio does not depend on the 
mean value of the pixels in the image; adding or 
subtracting a constant offset to all the pixels has 
no effect on the compression. Similarly, storing 
the image as signed or unsigned integers makes no 
difference. This is self-evident for the Rice and 
Hcompress algorithms since they operate only on 
the differences between pixels, not their absolute 
values. GZIP is also unaffected because the fre- 
quency distribution of different byte patterns is 
largely unchanged by applying a constant offset 
to the pixels. 
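
The statement about difference-based coders can be checked directly. In the following minimal Python sketch (the pixel values and the offset of 32768 are arbitrary illustrative choices), the sequence of pixel-to-pixel differences that such a coder would encode is unchanged by a constant offset.

```python
def differences(pixels):
    """Differences of consecutive pixels."""
    return [b - a for a, b in zip(pixels, pixels[1:])]

row = [1012, 1009, 1015, 1011, 1300, 1010]
shifted = [p + 32768 for p in row]   # same row with a constant offset added

same_diffs = differences(row) == differences(shifted)
```

Only the single starting value of each row or block changes, which has a negligible effect on the compressed size.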

4. Compression of 16-bit Astronomical 
Images 

In contrast to the tests in the previous section 
using synthetic noise images, we now examine how 
well the different compression methods perform on 
actual 16-bit integer astronomical images. The 
primary data set used in these tests consists of the
images that were taken during the night of 27-28 
July 2006 at Cerro Tololo Inter-American Observatory
using the Mosaic CCD camera. This camera
contains 8 individual CCD detectors, and each 
detector has 2 amplifiers that read out half of the 
chip each. Every exposure with this camera re- 
sults in a FITS file containing 16 image extensions 
that are each 1112 by 4096 pixels in size. In total, 
this data set consists of 102 FITS files containing 
1632 separate FITS image extensions. To complement
this large set of images taken with a single 
instrument, we also collected a small sample of 
other 16-bit integer images taken with other
instruments that were available from various public 
astronomical data archives. 

We compressed and uncompressed each of these 
images using the Rice, GZIP, and Hcompress al- 
gorithms supported by our tiled-image compres- 
sion software, and in each case recorded the com- 
pression ratio, the calculated equivalent number of 
noise bits (from Equation 4), and the elapsed com- 
pression and uncompression CPU times. We also 
measured these same parameters when using the 
GZIP program on the host computer to externally 
compress and uncompress the images. These host 
GZIP tests were performed on a single FITS image 
extension instead of on the whole multi-extension 
file, to be comparable with the tiled-image com- 
pression tests which also operate on a single image 
extension at a time. 

4.1. Compression Ratio versus Noise 

One of the most striking results of this study is 
shown in Figure 3, which plots the compression ratio
using the Rice algorithm versus the measured 
number of equivalent noise bits in each image. The 
compression ratio, R, is plotted, rather than the 
reciprocal quantity BITPIX/R that was used in the 
previous figures, because R is the quantity of more 
direct interest to most users. 

The Mosaic camera CCD images (plotted with 
small + symbols) very closely follow the same 
curve as derived from the synthetic images that 
contain only noise (shown by the gray line). The 
somewhat surprising conclusion that can be drawn 
from this close agreement is that the compression 
ratio is almost solely dependent on the amount of 
background noise in the image. The actual im- 
age content, i.e., all the stars and galaxies seen 
in the images, has practically no influence on the 
compression ratio. The sample of 15 other CCD 
images taken with other instruments, as shown 
by the larger circles, also mainly follows the same 
curve. Only 2 of these points, plotted with open 
circles, have a significantly lower compression ra- 
tio than expected. Inspection of these 2 images 
showed that they contain such an unusually dense 
pattern of overlapping star images that it ad- 
versely affects the compression efficiency. In the 
great majority of cases, however, the image content
has very little effect on the amount of lossless 
compression. 

Fig. 3. — The Rice compression ratio as a function 
of the amount of noise in 16-bit integer images. 
The gray line is derived from the synthetic noise 
images. 

Fig. 4. — The compression ratios for the different 
algorithms as a function of the noise in the 16-bit 
images (upper panel). The solid line shows the 
theoretical upper limit for an ideal compression 
algorithm. The middle and lower panels compare 
the Rice ratio to that of Hcompress and GZIP, 
respectively. 

It is worth noting that much of the scatter for 
the Mosaic camera images seen in this figure is 
due to slight differences between the 16 image detectors.
The inset in Figure 3 shows a magnified 
section of the data in which it can be seen that 
the points tend to lie along distinct bands that 
correspond to the different detectors. Each detec- 
tor has unique characteristics in its fixed pattern 
noise which cause the noise in the pixel values to 
not be truly randomly distributed. This causes 
a slight systematical bias in the noise calculation 
(from Equation 1), producing small horizontal dis-. 
placements of the points from each detector in the 
figure. Thus, most of the apparent scatter is the 
result of the superposition of the different detec- 
tors. 

One other interesting thing to note in Figure 3 
is that the different types of Mosaic camera im- 
ages are segregated into different regions of the 
plot. The “bias” frame images that have 0 expo- 
sure time all have less than 4.5 equivalent bits of 
noise. The short calibration exposures occupy the 
middle region of the plot, with 4.5 to 6.5 bits of 
noise. Finally, the deep sky exposures, as well as 
the heavily exposed flat field images, have more 
than 6.5 bits of noise. This is a natural conse- 
quence of the fact that the different types of im- 
ages have characteristically different mean count 
levels. Since the noise in a photon counting type 
detector scales as the square root of the number 
of detected counts, the different types of images 
will also have distinctly different noise levels. It is 
a somewhat perverse fact of nature that the more 
scientifically interesting astronomical images tend 
to have the most noise and hence have the worst 
compression factors. 

4.2. Comparison of Different Compression 
Algorithms 

Next we compare the file compression ratios 
and speed of the different compression methods. 
Figure 4 shows the compression ratios achieved 
by Rice, Hcompress, and tiled-GZIP plotted as a 
function of the equivalent number of noise bits for 
the 1632 Mosaic camera CCD images. 

As can be seen, Rice and Hcompress achieve 
very similar compression ratios that are both much 
higher than that produced by tiled-GZIP. The 
middle panel of the plot compares Rice and Hcom- 
press, showing that Hcompress produces about 2% 
to 5% better compression. As discussed below, 
this small gain is usually not cost effective because 
of the much higher CPU times needed by Hcom- 
press compared to Rice. The lower panel shows 
that Rice produces about 1.5 times better com- 
pression than tiled-GZIP for images with low to 
moderate amounts of noise and about 1.3 times 
better compression for the noisiest images. 

The solid curve in the top panel of Figure 4 
shows the theoretical maximum compression ratio,
given by BITPIX / $N_{\mathrm{bits}}$, that would be
produced by an ideal algorithm that compresses all 
the non-noise bits in the image to zero size (i.e., 
an algorithm with K = 0 in Equation 3). By 
comparison, Rice and Hcompress have K values 
of about 1.2 and 0.9 bits per pixel, respectively, 
which means that they already achieve about 75% 
to 90% (depending on the noise level) of the the- 
oretically best possible compression. Thus, it is 
not possible for any new lossless compression algo- 
rithm to produce dramatically better compression 
for astronomical data than is already achieved by 
Rice and Hcompress. 
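
The quoted percentages follow from dividing Equation 3 by Equation 2, which gives the fraction of the ideal ratio as $N_{\mathrm{bits}} / (N_{\mathrm{bits}} + K)$. The Python sketch below evaluates this for the K values quoted above; the noise levels of 4 and 8 bits are assumed here as representative low and high values and are not taken from the measurements.

```python
def fraction_of_ideal(nbits, k):
    """R / R_max = N_bits / (N_bits + K), from Equations 2 and 3."""
    return nbits / (nbits + k)

K_RICE = 1.2        # bits per pixel, as quoted in the text
K_HCOMPRESS = 0.9   # bits per pixel, as quoted in the text

fractions = {
    (name, nbits): fraction_of_ideal(nbits, k)
    for name, k in (("Rice", K_RICE), ("Hcompress", K_HCOMPRESS))
    for nbits in (4, 8)   # assumed illustrative noise levels
}
```

Under these assumptions the fractions run from about 0.77 (Rice at 4 noise bits) to about 0.90 (Hcompress at 8 noise bits), and the fraction rises with the noise level because K becomes a smaller share of the total.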

The relative compression and uncompression 
speeds¹ of the different methods are shown in Figures
5 and 6. When compressing images, Rice is 
2 to 3 times faster than Hcompress, depending on 
the amount of noise, and 4 to 6 times faster than 
tiled-GZIP (or 2.5 to 3.4 times faster than host- 
GZIP). And when uncompressing images, Rice is 
2 to 3 times faster than Hcompress, and 1.6 to 2 
times faster than tiled-GZIP (or close to the same 
speed as host-GZIP). 

The mean compression ratios and the relative 
compression and uncompression CPU times for 
the 1632 Mosaic camera images when using the 
different compression methods are summarized in 
Table 1, where the CPU times in each case are rel- 
ative to the time when using the Rice algorithm. 


¹ All the timing measurements in this paper are based on 
CPU times and not on the total elapsed processing times. 
In principle the elapsed time should be somewhat greater 
than the CPU time because it includes the latency times 
needed to read or write data on magnetic disk. In practice, 
however, the sophisticated data caching systems on modern 
disks and computer systems tend to minimize these I/O 
bottlenecks, to the point where they are difficult to measure 
consistently. 

Fig. 5. — Relative CPU time needed to compress 
16-bit integer FITS images using the GZIP (top) 
or Hcompress (bottom) algorithms as a function 
of the image noise level. The times are relative 
to the time when using the Rice algorithm. The 
horizontal banding of the points is due to the finite 
time resolution of the CPU measurements. 

Fig. 6. — Relative CPU time needed to uncompress
16-bit integer FITS images using the GZIP 
(top) or Hcompress (bottom) algorithms as a function
of the image noise level. The times are relative
to the time when using the Rice algorithm. 
The horizontal banding of the points is due to the 
finite time resolution of the CPU measurements. 

The uncompression speeds are arguably more important
than the compression speeds because in 
many instances an image only has to be com- 
pressed once (by the data provider) but it must 
be uncompressed by every user, sometimes mul- 
tiple times if the analysis software directly reads 
the image in its compressed form. The actual com- 
pression or uncompression speed depends of course 
on the underlying computing system. As a benchmark
reference, one of our current Linux machines 
with a 2.4 GHz AMD Opteron 250 dual core pro- 
cessor (using only one of the processors) can tile 
Rice compress a 50 MB 16-bit integer FITS image 
in 1 second. Uncompressing this image also takes 
about 1 second. 

4.3. Special case: Data Masks 

Data mask images are often used in data anal- 
ysis as a means of flagging special conditions that 
affect the corresponding pixels in an associated as- 
tronomical image. Typically, a small fraction of 
the pixels have one of a limited set of positive 16- 
bit integer values, and the remaining large major- 
ity of pixels are all equal to 0 or 1. These data 
masks have so little noise that they do not follow 
the general relationship between noise and com- 
pression ratio seen in more typical astronomical 
images. The compression ratio is not limited by 
the amount of noise, and instead is limited by the 
intrinsic internal overheads associated with each 
compression algorithm and with the FITS tiled- 
image file structure itself. With the Rice algo- 
rithm, for example, each block of 32 image pix- 
els (= 64 bytes) can, in the extreme case where 
all the pixels have the same value, be compressed 
to a single 4-bit code value for a maximum com- 
pression ratio of 128. A similar analysis of the 
other available algorithms shows that the maxi- 
mum compression ratio is about 200 for GZIP and 
about 700 for both Hcompress and PLIO. These
maximum compression ratios are only achieved
with relatively large image tile sizes. For smaller
tiles, less than a few thousand bytes in size, the
compression algorithms become less efficient, and
other fixed-size overheads, such as the 8 bytes per
tile needed to store the byte offset and size of each
tile in the FITS binary table format, become
relatively more significant. Even so, the data masks
still usually compress by a factor of 50 or more.
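The arithmetic behind these limits can be sketched in a few lines of Python. This is a simplified model and not a description of the actual encoders: it assumes that each tile compresses at the asymptotic ratio of the algorithm and that the only additional cost is the 8 bytes per tile of binary table overhead mentioned above. The function names are illustrative.

```python
def rice_max_ratio(bitpix=16, block_pixels=32, code_bits=4):
    """Best-case Rice ratio when every pixel in a block has the same value."""
    block_bits = bitpix * block_pixels      # 32 pixels x 16 bits = 512 bits
    return block_bits / code_bits           # 512 / 4 = 128


def tile_ratio(tile_bytes, algorithm_ratio, overhead_bytes=8):
    """Effective ratio for one tile once the fixed per-tile overhead is added.

    Simplified model: the algorithm itself is assumed to stay at its
    asymptotic ratio, which is optimistic for small tiles.
    """
    compressed_bytes = tile_bytes / algorithm_ratio + overhead_bytes
    return tile_bytes / compressed_bytes


if __name__ == "__main__":
    print("Rice limit:", rice_max_ratio())
    for tile_bytes in (512, 4096, 65536):
        print(tile_bytes, "byte tile:", round(tile_ratio(tile_bytes, 128.0), 1))
```

Even in this optimistic model the fixed overhead alone pulls a 512-byte tile well below the factor of 128, while a tile of several tens of kilobytes comes close to it.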

In many cases, achieving the very highest possible
compression of data masks is of little practical
benefit because the size of the compressed data
mask image becomes insignificant compared to the 
rest of the associated data set. Each data mask 
image has an uncompressed FITS header that is 
at least a few thousand bytes in size, and each 
data mask is usually paired with an equal-sized 
astronomical image that itself can usually only be 
compressed by a factor of 2 - 3. Thus, the size of 
the compressed data mask is only a small percent- 
age of the total data set size, and reducing the size 
even further makes very little practical difference. 
The compressed data masks are essentially ‘free’ 
for data providers because they take up almost no 
disk storage space compared to the rest of the data 
set. 
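As a rough worked example, take the representative factors quoted above: a mask compressed by a factor of 50, paired with an equal-sized image of uncompressed size S that compresses by a factor of 2.5 (the midpoint of the quoted range, chosen here for illustration), and ignore the headers. The mask's share of the compressed total is then approximately

```latex
\frac{S/50}{S/50 + S/2.5} = \frac{0.02}{0.02 + 0.40} \approx 0.048
```

that is, about 5% of the compressed data set. Under the same assumptions, doubling the mask compression factor to 100 would shrink the total by only about 2.4%.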

Since the compression ratio is less of a fac- 
tor in choosing which algorithm to use, the speed 
of the algorithm becomes a more significant con- 
sideration. Rice and PLIO are the fastest algo- 
rithms when compressing data masks, but GZIP 
and Hcompress are less than a factor of 2 slower, 
depending slightly on tile size. Overall, PLIO pro- 
vides the best combination of compression ratio 
and speed when compressing data masks, but in 
practice it may be more convenient to use the same 
compression algorithm on the data mask as is used 
on the associated astronomical image. 

5. Compression of 32-bit Integer Images 

In order to measure the performance of the dif- 
ferent compression methods on 32-bit integer im- 
ages, we obtained a sample of FITS images taken 
with the NEWFIRM near-infrared camera during 
the night of 24 - 25 February 2008 at Kitt Peak Na- 
tional Observatory. The NEWFIRM instrument 
contains a mosaic of 4 imaging detectors, each of 
which is 2112 by 2048 pixels in size. There are 
447 NEWFIRM observations in our data sample, 
giving a total of 1788 separate images. 

5.1. Comparison of Different Compression 
Algorithms 

Table 1

16-bit integer image compression
(CPU times are relative to Rice)

                          Rice   Hcompress   Tiled-GZIP   Host-GZIP
Compression Ratio         2.11   2.18        1.53         1.64
Compression CPU time      1.0    2.8         5.6          2.6
Uncompression CPU time    1.0    3.1         1.9          0.85

Fig. 7. 32-bit integer image compression ratio
for the Rice (upper points) and GZIP (lower
points) algorithms plotted as a function of the
noise level in the image. The analogous points for
the Hcompress algorithm are not shown because
they fall nearly on top of the Rice points. The gray
lines going through these points are derived from
synthetic images in which the pixel values have a
constant level plus a known amount of Gaussian
distributed noise. The points are segregated into
4 distinct bands, which correspond to the 4
different detectors in the NEWFIRM mosaic camera.

We repeated the previous tests to measure the
compression ratios and the CPU times required to
compress and uncompress each of the 32-bit
integer NEWFIRM images using the different
compression methods. Figure 7 shows how the Rice
and tiled-GZIP compression ratios vary as a
function of the measured number of equivalent noise
bits in each image. (The points for the Hcompress
algorithm have been omitted for clarity in this
figure because they lie only slightly above the Rice
points.) This figure is similar to the corresponding
Figure 4 for 16-bit integer images, except that the
compression ratios are about twice as large, given
the same amount of image noise. This is a natural
result of the fact that a 32-bit integer image is
twice as large as a corresponding 16-bit image, but
the compressed size of the image, at least when
using Rice or Hcompress, only depends on the noise,
not on the intrinsic bit-length of the pixels. One
important consequence of this fact is that there is
no disk space penalty in storing FITS images as
32-bit integer arrays instead of 16-bit integer
arrays, because the Rice or Hcompress compressed
images are identical in size. It still does require
slightly more CPU time to compress or uncompress
the 32-bit representation of the image, however.

This 2:1 relationship in compression ratios does 
not hold for the GZIP algorithm because GZIP op- 
erates on the individual bytes in the image data, 
not on the numerical 2-byte or 4-byte integer pixel 
values. The presence of the 2 higher order bytes 
in the 32-bit image degrades the compression effi- 
ciency when using GZIP even if those 2 bytes are 
equal to zero in every pixel. Thus, the compression
ratio of a 32-bit image when using GZIP is
only about 1.6 times greater than that of a 16-bit
image with the same noise level.
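The byte-oriented behavior of GZIP can be illustrated with a small experiment. The sketch below is not the procedure used for the measurements in this paper: it uses Python's zlib module (the DEFLATE algorithm that underlies GZIP) on a synthetic row of pixels with a constant level plus Gaussian noise, and the level, noise and sample size are arbitrary choices made for illustration.

```python
import random
import struct
import zlib


def deflate_ratio(pixels, fmt):
    """Raw size over DEFLATE-compressed size for big-endian integer pixels."""
    raw = struct.pack(">%d%s" % (len(pixels), fmt), *pixels)
    return len(raw) / len(zlib.compress(raw, 6))


def synthetic_pixels(n=20000, level=1000, sigma=10.0, seed=1):
    """Constant background plus Gaussian noise, rounded to integers."""
    rng = random.Random(seed)
    return [round(level + rng.gauss(0.0, sigma)) for _ in range(n)]


if __name__ == "__main__":
    pixels = synthetic_pixels()
    r16 = deflate_ratio(pixels, "h")   # same values stored as 2-byte integers
    r32 = deflate_ratio(pixels, "i")   # same values stored as 4-byte integers
    print("16-bit: %.2f  32-bit: %.2f  ratio of ratios: %.2f"
          % (r16, r32, r32 / r16))
```

Because the two extra high-order bytes of every 32-bit pixel still cost something to encode, the 32-bit ratio comes out larger than the 16-bit ratio but less than twice as large. The exact figures depend on the synthetic data and should not be compared directly with the values measured on real images.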

It can also be seen in Figure 7 that the points
are segregated into 4 distinct bands that
correspond to the 4 different detectors in the
NEWFIRM mosaic camera. Unlike the CCD detectors
in the MOSAIC camera, which are very closely
matched in image quality and noise characteristics,
the 4 infrared imaging devices in the custom-built
NEWFIRM camera have distinctive characteristics.
In particular, one of the detectors appears to
have significantly less noise than the other 3
detectors (i.e., the points are shifted to the
left). This is at least partially due to the fact
that there is a systematic offset between the mean
background level in the even and odd numbered
columns in this detector. Since our noise
estimation algorithm (see section 3) depends on the
differences between every other pixel in each row,
it does not respond to this column-to-column offset
and therefore underestimates the true pixel-to-pixel
noise variation in the images taken with this detector.
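The effect of the even/odd column offset can be reproduced with synthetic data. The estimator below is a simplified stand-in and not the algorithm of section 3: it takes the rms of differences between pixels a fixed number of columns apart. The noise level and the size of the offset are assumed values.

```python
import math
import random


def rms(values):
    """Root mean square of a sequence of numbers."""
    return math.sqrt(sum(v * v for v in values) / len(values))


def noise_from_lag(row, lag):
    """Estimate sigma from differences between pixels `lag` columns apart.

    For independent Gaussian noise the difference of two pixels has a
    standard deviation of sqrt(2) * sigma, hence the division.
    """
    diffs = [row[i + lag] - row[i] for i in range(len(row) - lag)]
    return rms(diffs) / math.sqrt(2.0)


def synthetic_row(n=20000, sigma=5.0, column_offset=20.0, seed=2):
    """Gaussian noise plus a fixed offset added to the odd-numbered columns."""
    rng = random.Random(seed)
    return [rng.gauss(0.0, sigma) + (column_offset if i % 2 else 0.0)
            for i in range(n)]


if __name__ == "__main__":
    row = synthetic_row()
    print("every other pixel:", round(noise_from_lag(row, 2), 2))
    print("adjacent pixels:  ", round(noise_from_lag(row, 1), 2))
```

Differences between every other pixel always compare columns of the same parity, so the offset cancels and the estimate stays near the underlying sigma of 5. Differences between adjacent pixels include the offset and give an estimate near 15, which is closer to the variation that a compression algorithm actually has to encode.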

As was the case for 16-bit images, the different 
types of images fall in different regions of Figure 7. 
The clump of points with Nbits < 5 and with Rice 
compression factors greater than 5 corresponds to
‘bias’ calibration exposures that were taken with 
a closed shutter. The points with 5 < Nbits < 7 
and Rice compression factors of about 4.5 corre- 
spond to short exposures of calibration stars, and 
the remaining points with larger noise values and 
Rice compression factors of ~ 3.5 correspond to 
the deep sky images as well as the heavily exposed 
flat field images. 

For comparison, the upper solid curve in Figure
7 shows the maximum possible compression ratio
that would be achieved by an ideal compression
algorithm that has K = 0. Hcompress and Rice
have K values of about 0.9 and 1.2, respectively,
exactly the same as with 16-bit integers, and thus
are again within 70% to 90% of this theoretical
limit, depending on the amount of noise in the
image.
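These percentages follow from simple arithmetic if one assumes, as the discussion of K implies, that each compressed pixel costs the number of noise bits plus K bits, so that the compression ratio is BITPIX / (Nbits + K). The sketch below evaluates that assumed relation; the function names are illustrative.

```python
def predicted_ratio(bitpix, nbits, k=0.0):
    """Compression ratio when each pixel costs nbits + k bits once compressed."""
    return bitpix / (nbits + k)


def fraction_of_ideal(nbits, k):
    """Achieved ratio divided by the ideal (K = 0) ratio; independent of bitpix."""
    return nbits / (nbits + k)


if __name__ == "__main__":
    for name, k in (("Hcompress", 0.9), ("Rice", 1.2)):
        for nbits in (3, 8):
            print(name, "Nbits =", nbits,
                  "ratio (32-bit):", round(predicted_ratio(32, nbits, k), 2),
                  "fraction of ideal:", round(fraction_of_ideal(nbits, k), 2))
```

With 8 noise bits, K = 1.2 gives 8/9.2, or about 87% of the ideal ratio, and the fraction falls to about 71% at 3 noise bits. The same relation gives exactly twice the ratio for 32-bit pixels as for 16-bit pixels at a given noise level.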

Figures 8 and 9 compare the CPU times re- 
quired to compress and uncompress the 32-bit in- 
teger images with GZIP or Hcompress, relative to 
the time required when using the Rice algorithm. 
These are analogous to Figures 5 and 6 for 16- 
bit images. The average compression ratios and 
the relative compression and uncompression CPU 
times for all 1788 images are also summarized in 
Table 2. As can be seen, the speed advantage of 
Rice over Hcompress or tiled-GZIP is even greater 
when compressing or uncompressing 32-bit images 
than with 16-bit images. Our benchmark Linux
machine (2.4 GHz AMD Opteron 250 dual core
processor) can Rice-compress a 90 MB 32-bit
integer FITS image in about 1 second and can
uncompress the same image in about 1.2 seconds.

Fig. 8. CPU time needed to compress 32-bit
FITS images using the GZIP (top) or Hcompress
(bottom) algorithms as a function of the image
noise level. The times are relative to the time
to compress the same image using the Rice
algorithm.

Fig. 9. CPU time needed to uncompress 32-bit
FITS images using the GZIP (top) or Hcompress
(bottom) algorithms as a function of the image
noise level. The times are relative to the time
to uncompress the same image using the Rice
algorithm.

Table 2

32-bit integer image compression
(CPU times are relative to Rice)

                          Rice   Hcompress   Tiled-GZIP   Host-GZIP
Compression Ratio         3.76   3.83        2.30         2.32
Compression CPU time      1.0    5.2         7.8          4.7
Uncompression CPU time    1.0    3.4         2.2          1.3

Overall, Rice is clearly the best lossless
compression method for 32-bit integer images. In the
case of a typical deep-sky image with 8 equivalent
bits of noise, Rice produces 1.6 times better
compression than tiled-GZIP, and has 8 times
the compression speed and 2 times the
uncompression speed. Compared with Hcompress, Rice
is 5 and 3.5 times faster, respectively, when
compressing and uncompressing images, which more
than makes up for the slight 2% difference in
compression ratio.

5.2. Special case: Representing Floating- 
Point Images as Scaled Integers 

Instead of representing floating-point pixels di- 
rectly in images, a widely used FITS convention 
converts the floating-point values into scaled inte- 
gers, where the (approximate) floating point value 
is then given by 

$$\mathrm{real\_value} = \mathrm{BSCALE} \times \mathrm{integer\_value} + \mathrm{BZERO} \qquad (6)$$

and where BSCALE and BZERO are linear scaling
constants given as keywords in the header of the
FITS image. This is technically a ‘lossy’
compression technique that quantizes all the pixel
values into a set of discrete levels, spaced at
intervals of BSCALE. Ideally, the quantization
levels should be spaced finely enough so as to not
lose any scientific information in the image, but
without also preserving excessive amounts of noise.

Unfortunately, a common practice is to simply
compute the BSCALE value so that the minimum
and maximum pixel values are scaled to span the
full 32-bit dynamic range of the scaled integer
image. This has the effect of magnifying the noise
in the floating-point array by a huge factor, which
makes the scaled integer array virtually
incompressible.

In order to achieve higher compression, a better
technique, as first described by White & Greenfield
(1999), is to choose the BSCALE value so
that the quantized levels are spaced at some
reasonably small fraction of the noise in the image,
such that

$$\mathrm{spacing} = \mathrm{BSCALE} = \sigma / D \qquad (7)$$

where σ is calculated from Equation 1. The
number of noise bits per pixel that are preserved in
this case can be calculated from Equation 4 and is
simply log2(D) + 1.792. In order to achieve the
best compression, data providers should choose
the smallest value of D that still preserves the
required scientific information in the compressed
image. This will depend on the particular
application, but previous experiments (see for
example Figure 2 in White & Greenfield, 1999)
suggest that values of D in the range of 10 to 100
may be appropriate.
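The quantization step can be sketched as follows. This is a minimal illustration and not the implementation used in our software: the noise sigma is passed in directly rather than measured with Equation 1, `spacing` is the factor that multiplies the integer values in the linear scaling of Equation 6, and `zero` is a hypothetical name for the additive offset.

```python
import math
import random


def quantize(values, sigma, d, zero=0.0):
    """Convert floats to scaled integers with a level spacing of sigma / d."""
    spacing = sigma / d
    ints = [round((v - zero) / spacing) for v in values]
    return ints, spacing


def restore(ints, spacing, zero=0.0):
    """Apply the linear scaling to recover the approximate float values."""
    return [spacing * i + zero for i in ints]


def preserved_noise_bits(d):
    """Noise bits kept per pixel: log2(D) + log2(sqrt(12))."""
    return math.log2(d) + math.log2(math.sqrt(12.0))


if __name__ == "__main__":
    rng = random.Random(3)
    sigma = 2.5
    values = [100.0 + rng.gauss(0.0, sigma) for _ in range(1000)]
    ints, spacing = quantize(values, sigma, d=16)
    approx = restore(ints, spacing)
    worst = max(abs(a - v) for a, v in zip(approx, values))
    print("spacing:", spacing, "worst error:", worst)
    print("noise bits kept:", round(preserved_noise_bits(16), 3))
```

The worst-case error of any restored pixel is half the level spacing, that is sigma / (2D), so with D = 16 every pixel is reproduced to within about 3% of the noise sigma while just under 6 noise bits per pixel remain to be compressed.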

6. Compression of Floating-point Images 

FITS images that have 32-bit floating-point 
pixel values are more challenging to compress than
integer images for 2 reasons. Firstly, many com- 
pression algorithms, including Rice and Hcom- 
press, can by design only operate on integer data. 
Secondly, floating-point pixel values often con- 
tain a large amount of noise which greatly hin- 
ders compression. Since a 32-bit floating-point 
value can record about 6.5 decimal places of pre- 
cision, whereas most astronomical images rarely 
have more than 3 decimal places of significance 
per pixel, this means that many of the lower or- 
der bits in each pixel value effectively just contain 
incompressible random noise. As a result, even al- 
gorithms like GZIP which can losslessly compress 
floating-point data (since it treats each byte as an 
independent datum) are not very effective. To test
this, we collected a sample of 17 floating-point
images from public astronomical archives taken with
different instruments. Of these, 13 of the images
compressed very poorly with GZIP as expected,
with low compression factors ranging from 1.08
to 1.25. Surprisingly, the other 4 images did quite
well, with compression factors of 2 to more than 5.
Closer inspection showed that these are not
typical floating-point images: the pixel values in
these images are quantized into a limited set of
discrete levels and therefore effectively behave
more like integer images insofar as compression is
concerned.
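Both behaviors can be reproduced with synthetic data. The sketch below is only an illustration under assumed values (a background of 1000 with Gaussian noise of sigma 10, and Python's zlib module standing in for GZIP); it is not the test that produced the figures quoted above.

```python
import random
import struct
import zlib


def float32_deflate_ratio(values):
    """Raw size over DEFLATE-compressed size for big-endian float32 pixels."""
    raw = struct.pack(">%df" % len(values), *values)
    return len(raw) / len(zlib.compress(raw, 6))


def synthetic_floats(n=20000, level=1000.0, sigma=10.0, seed=4):
    """Constant background plus Gaussian noise, kept as floating-point values."""
    rng = random.Random(seed)
    return [level + rng.gauss(0.0, sigma) for _ in range(n)]


if __name__ == "__main__":
    noisy = synthetic_floats()
    # The same pixels restricted to a limited set of discrete levels.
    discrete = [float(round(v)) for v in noisy]
    print("typical floats:  %.2f" % float32_deflate_ratio(noisy))
    print("discrete levels: %.2f" % float32_deflate_ratio(discrete))
```

In the first case the low-order mantissa bytes are effectively random and dominate the compressed size. In the second case those bytes are almost constant, so the same pixel values compress much better even though they are still stored as 32-bit floating-point numbers.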

Given these difficulties, it is generally not
cost-effective to losslessly compress floating-point
FITS images. Therefore, we use the technique
discussed in section 5.2, which converts the
floating-point values into scaled 32-bit integers
and then compresses them using (by default) the
Rice algorithm. The linear scaling parameters are
calculated independently for each tile of the image,
and the BSCALE value is derived based on the amount
of noise in each tile to preserve only a
user-specified number of noise bits in the
compressed image. By discarding the remaining noise
bits, the image compression ratio can be
dramatically increased. It is, however, incumbent
upon the user to determine the appropriate number of
noise bits to be preserved so as to not degrade the
scientific usefulness of the image. As a rough
guide, we have found that retaining 6 to 8 bits of
noise in the scaled integer image is often
sufficient. As can be seen from Figure 7, this will
result in image compression ratios of about 4 when
using the Rice algorithm.

7. Effect of Tiling Pattern on Compression 
Performance 

There are a number of considerations in choosing
an appropriate tiling pattern when compressing an
image. First, the tile must be sufficiently large for
the compression algorithm to operate efficiently.
For the Rice algorithm, the lower limit is about
500 pixels; for GZIP it is about 2000. Below these
levels, the compression time for the image and the
size of the compressed file both begin to increase.
The Hcompress algorithm is inherently different
from Rice and GZIP in that the wavelet transform
only operates on 2-dimensional arrays of data. At
a minimum it requires tiles containing at least 4
rows of the image, and it reaches near maximum
efficiency when the tiles contain about 16 rows.
For this reason we adopted 16 rows of the image at
a time as the default tiling pattern in our software
when using Hcompress.
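The size limits above can be turned into a simple rule of thumb for choosing a tile shape. The heuristic below is hypothetical and is not the logic used in our software: it starts from one row per tile (16 rows for Hcompress) and adds rows only when a single row contains fewer pixels than the approximate lower limit for the algorithm.

```python
def default_tile_shape(algorithm, naxis1, naxis2):
    """Pick a (width, height) tile for an image of naxis1 x naxis2 pixels.

    Illustrative heuristic only: row-by-row tiles, 16 rows for Hcompress,
    and more rows per tile when one row falls below the approximate
    minimum efficient tile size quoted in the text.
    """
    min_pixels = {"RICE": 500, "GZIP": 2000}
    if algorithm == "HCOMPRESS":
        return naxis1, min(16, naxis2)
    rows = 1
    needed = min_pixels[algorithm]
    if naxis1 < needed:
        # Ceiling division: smallest number of rows reaching `needed` pixels,
        # capped at the full height of the image.
        rows = min(naxis2, -(-needed // naxis1))
    return naxis1, rows


if __name__ == "__main__":
    print(default_tile_shape("RICE", 2048, 2048))
    print(default_tile_shape("HCOMPRESS", 2048, 2048))
    print(default_tile_shape("RICE", 100, 100))
    print(default_tile_shape("GZIP", 100, 10))
```

For a 2048 by 2048 image this returns single-row tiles for Rice and 16-row tiles for Hcompress. For a 100 by 100 image it groups 5 rows per tile for Rice, and for a very small image it falls back to compressing the whole image as one tile.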

The other main consideration when choosing a 
tile size is how the software that reads the image 
will access the pixels. The 2 most common access 
methods used by astronomical software are either 
to read the entire array of pixels in the image into 
computer memory all at once, or to read the im- 
age sequentially one row at a time. In the first 
case, the specific tiling pattern makes very little 
difference because the reading routine simply has 
to uncompress each and every tile in the image 
once and pass the array of uncompressed pixels 
back to the application program. 

If the application program reads the image one 
row at a time, then the tiling pattern can have a 
major effect on the reading speed. If each tile con- 
tains multiple rows of the image (and in the limit, 
the whole image could be compressed as one big 
tile), then the FITS file reading routine has to un- 
compress the whole tile in order to extract just a 
single row. It would obviously be very inefficient 
to repeatedly uncompress the same tile each time 
the application program requests the next row of 
pixels. Instead, a recommended implementation 
strategy is to temporarily store the most recently 
accessed uncompressed tile in memory, so that it is 
immediately available in case the application pro- 
gram reads more pixels from that same tile. This 
caching technique adds some computational over- 
head, however, so in general the default single row 
tiling pattern is the most efficient for applications 
that read an image row by row. 
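The caching strategy described above can be sketched as follows. The class is a hypothetical stand-in for a FITS reading routine and is not the CFITSIO implementation: tiles are plain byte strings compressed with zlib for illustration, and only the most recently uncompressed tile is kept in memory.

```python
import zlib


class CachedTileReader:
    """Serve image rows from compressed tiles, caching the last tile used.

    Hypothetical stand-in for a FITS reader: each tile holds
    `rows_per_tile` rows of raw bytes compressed with zlib.
    """

    def __init__(self, rows, rows_per_tile):
        self.rows_per_tile = rows_per_tile
        self.row_len = len(rows[0])
        self.tiles = [
            zlib.compress(b"".join(rows[i:i + rows_per_tile]))
            for i in range(0, len(rows), rows_per_tile)
        ]
        self.cached_index = None
        self.cached_data = b""
        self.uncompress_calls = 0

    def read_row(self, row):
        """Return one row, uncompressing its tile only on a cache miss."""
        tile_index, row_in_tile = divmod(row, self.rows_per_tile)
        if tile_index != self.cached_index:
            self.cached_data = zlib.decompress(self.tiles[tile_index])
            self.cached_index = tile_index
            self.uncompress_calls += 1
        start = row_in_tile * self.row_len
        return self.cached_data[start:start + self.row_len]


if __name__ == "__main__":
    rows = [bytes([r % 256]) * 64 for r in range(32)]
    reader = CachedTileReader(rows, rows_per_tile=16)
    for r in range(32):
        assert reader.read_row(r) == rows[r]
    print("tiles uncompressed:", reader.uncompress_calls)
```

Reading the 32 rows in order uncompresses each of the two 16-row tiles only once. Without the cache the same loop would uncompress a tile 32 times.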

A third type of image access occurs in applica- 
tions that read a rectangular ‘cutout’ from a much 
larger compressed image. In this case it can be 
efficient to use a rectangular tile pattern that ap- 
proximates the size of the typical cutout. Only 
those tiles that overlap the cutout region will then 
have to be uncompressed. This tiling pattern may
be grossly inefficient, however, for software that
accesses the image one row at a time, unless a fairly
sophisticated caching mechanism is implemented
to store all the uncompressed tiles along a row.
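The saving for cutouts comes from simple index arithmetic, sketched below with 0-based pixel coordinates. The function name, the image size and the tile sizes are illustrative assumptions.

```python
def tiles_for_cutout(x0, y0, x1, y1, tile_width, tile_height):
    """Return (column, row) indices of the tiles that overlap a cutout.

    The cutout spans pixels x0..x1 and y0..y1 inclusive, and tile (0, 0)
    starts at pixel (0, 0).
    """
    first_col, last_col = x0 // tile_width, x1 // tile_width
    first_row, last_row = y0 // tile_height, y1 // tile_height
    return [(col, row)
            for row in range(first_row, last_row + 1)
            for col in range(first_col, last_col + 1)]


if __name__ == "__main__":
    # A 150 x 150 pixel cutout from a 2048 x 2048 image.
    square = tiles_for_cutout(400, 900, 549, 1049, 128, 128)
    by_row = tiles_for_cutout(400, 900, 549, 1049, 2048, 1)
    print(len(square), "tiles,", len(square) * 128 * 128, "pixels uncompressed")
    print(len(by_row), "tiles,", len(by_row) * 2048, "pixels uncompressed")
```

In this example the 128 by 128 tiling touches 4 tiles (65,536 pixels), whereas the row-by-row tiling touches 150 tiles (307,200 pixels) to deliver the same 22,500 pixel cutout.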

In summary, the default row by row tiling
pattern (or 16 rows at a time in the case of
Hcompress) should work well in most situations. The
main exception is if the images are very small, in
which case it may be more efficient to compress
multiple rows, or the entire image, as a tile.

8. Summary

In this paper we have demonstrated how the 
presence of random noise in an image almost com- 
pletely determines how much the image can be 
losslessly compressed. The average number of 
noise bits per pixel in an image can be accurately 
derived from the Gaussian sigma of the pixel vari- 
ations in background regions of the image. Since 
these random noise bits are inherently incompress- 
ible, the maximum possible lossless compression 
ratio, in the ideal case where all the remaining 
non-noise bits are compressed to zero, is simply 
given by the number of bits per pixel divided by 
the average number of noise bits per pixel. In
practice of course, no actual compression algorithm
can achieve this ideal amount of compression, and
instead can only compress the non-noise bits down
to some finite value of K bits per pixel. The
K value of each algorithm can be empirically
measured and can be used to rank the compression
efficiency of the algorithms. The most efficient
algorithms used in this study are Hcompress and
Rice, which have K values of 0.9 and 1.2 bits per
pixel, respectively. When compressing a typical
integer CCD image with 8 equivalent noise bits
per pixel, these algorithms achieve close to 90%
of the maximum possible amount of compression
that would be achieved by an ideal algorithm with
K = 0. The Rice algorithm in particular is also
exceptionally fast, so it is deemed unlikely that any
new algorithm that may be developed in the fu- 
ture will be able to match its combination of speed 
and compression efficiency. 

We use a relatively new FITS format convention 
for storing compressed images in which the image 
is divided into rectangular tiles (usually row by 
row) and then each tile is compressed and stored 
in a row of a FITS binary table structure. The 
main advantages of this compression format over 
the common technique of externally compressing 
the whole FITS file with a file compression tool like 
GZIP are: (1) the FITS header keywords remain
uncompressed for fast read and write access, (2)
small sections of an image can be read without 
having to uncompress the whole image, and (3) a 
single image extension in a multi-extension FITS 
file can be read without uncompressing the entire 
file. 

This tiled image compression technique offers
a choice of different compression algorithms.
The Rice and Hcompress algorithms are generally
much more effective than GZIP at losslessly
compressing astronomical images because they operate
on the numerical 16-bit or 32-bit pixel values
rather than just treating them as a sequence of
independent 8-bit bytes. As a result, Rice and
Hcompress produce about 1.3 times more compression
than GZIP with 16-bit integer images, and about 1.6
times more with 32-bit images. Although Rice and
Hcompress produce similar amounts of compression,
Rice is about 3 times faster and is therefore
recommended for general use. Rice is also much
faster than GZIP when compressing images and
has about the same speed when uncompressing
them.

We compared the various compression methods
on a large sample of astronomical images obtained
from NOAO as well as a smaller sample from other
major observatories. Almost all these images
followed the expected compression ratio versus noise
relationship, with remarkably little scatter. This
demonstrates that the actual content of the image,
i.e., the stars and galaxies or other image features,
has almost no effect on the compression ratio. We
only found a few cases where the density of stellar
images was so great that it adversely affected the
amount of compression.

One interesting result from this comparison 
is the fact that the different types of astronom- 
ical images contain characteristically different 
amounts of noise and therefore have distinctly 
different compression ratios. The amount of noise 
correlates strongly with exposure time or the mean 
count level in the image, so the short exposure 
calibration images typically have less noise and 
compress better than deep images of the sky. 

Finally, we investigated compression techniques 
for 32-bit floating-point astronomical images. Due 
to the large fraction of noise bits in these images, 
lossless compression is generally not cost effective. 
Instead, we developed a technique where the pixel 
values are converted to scaled 32-bit integers be- 
fore compression. This is not a lossless compres- 
sion technique, since quantizing the pixel values in 
this way discards some of the noise. When used 
properly, this technique will preserve the scientific 
integrity of the image, but by discarding some of 
the random noise it will give much higher
compression ratios of about 4 or more, instead of only
about 1.2 when losslessly compressing the image
with GZIP.

One of the main conclusions from this study is 
that the current lossless compression techniques 
(in particular, the tiled image Rice compression 
method) are very close to achieving the theoreti- 
cal maximum limit that is set by the amount of 
noise in the images. Very little added benefit can 
be gained from developing better lossless
compression algorithms. Instead, the only way to make
significant improvements in astronomical image
compression is to use lossy techniques which discard
some of the incompressible noise without degrad- 
ing the scientific information in the images. 

The general purpose FITS tiled-image compres- 
sion and uncompression software tools used in this 
study are publicly available from the HEASARC 
web site. These tools are distributed as part of 
the CFITSIO library package and can also be 
downloaded from a dedicated web page at the 
HEASARC web site. The programs are called 
fpack and funpack and are invoked on the
command line to compress or uncompress an input list
of FITS images, analogous to the gzip and gunzip 
utilities. Various user parameters can be specified 
on the command line to select which compression 
algorithm to use and to specify the desired image 
tile sizes. There is also a ‘-T’ test option which 
can be used to generate a report which compares 
the compression ratio and speed of all available 
compression algorithms on the specified input im- 
ages. More extensive information about using the 
fpack and funpack utility programs is available in 
the companion user's guide that is included in the
distribution.

A. Derivation of the Equivalent Number of Noise Bits

For images with a Gaussian noise distribution (for instance, the readout floor of a CCD), we derive the
equivalent number of noise bits. Start by assuming N bits of uniform noise and average over the range of
data numbers (x = DN) for the expected values of $x$ and $x^2$:

$$\bar{x} = \frac{k(k+1)/2}{2^N} = \frac{2^N - 1}{2} \qquad (\text{sum of the DN series, with } k = 2^N - 1)$$

and

$$\overline{x^2} = \frac{k(k+1)(2k+1)/6}{2^N} = \frac{(2^N - 1)(2^{N+1} - 1)}{6} \qquad (\text{sum of the series of squares})$$

Solve for the variance,

$$\sigma^2 = \overline{x^2} - \bar{x}^2 = \frac{2^{2N} - 1}{12}$$

In the limit of large N:

$$\sigma = \frac{2^N}{\sqrt{12}}$$

Solving for N then gives,

$$N_{\mathrm{bits}} = \log_2(\sigma \sqrt{12}) = \log_2(\sigma) + 1.792$$


The factor of $1/\sqrt{12}$ can be identified as the familiar analog-digital quantization noise (Janesick 2001).
This can be derived with continuous variables by integrating the second moment of a stepwise probability
density symmetrically over $2^N$ quanta. The discrete derivation above makes the non-linear limiting behavior
at small values of N evident.
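The discrete result and its limiting behavior can be checked numerically. The short script below is an independent check written for illustration: it computes the exact variance of the integers 0 to 2^N - 1 with rational arithmetic and then applies the large-N relation to the resulting sigma.

```python
import math
from fractions import Fraction


def uniform_variance(n_bits):
    """Exact variance of the integers 0 .. 2**n_bits - 1, each equally likely."""
    count = 2 ** n_bits
    mean = Fraction(sum(range(count)), count)
    mean_sq = Fraction(sum(x * x for x in range(count)), count)
    return mean_sq - mean ** 2


def equivalent_noise_bits(sigma):
    """Large-N relation from the derivation: log2(sigma * sqrt(12))."""
    return math.log2(sigma * math.sqrt(12.0))


if __name__ == "__main__":
    for n in (1, 2, 4, 8, 12):
        var = uniform_variance(n)
        # The closed form derived above: (2^(2N) - 1) / 12.
        assert var == Fraction(2 ** (2 * n) - 1, 12)
        print(n, round(equivalent_noise_bits(math.sqrt(var)), 4))
```

For N = 1 the relation returns about 0.79 bits rather than 1, while for N of 8 or more the result agrees with N to better than one part in a thousand, which is the non-linear behavior at small N noted above.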

Figures 1 and 2 demonstrate empirical agreement of synthetic data with the Nbits relation for both 16-bit
and 32-bit integer pixels. Figures 3 and 7 empirically confirm this relation for real world optical and infrared
data sets.